August 2023

Piping

The basic piping operator

library(magrittr)
x <- 9
# Calculate the square-root of x
sqrt(x)
## [1] 3
# Calculate it using pipes
x %>% sqrt
## [1] 3
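For comparison, base R has shipped its own native pipe, `|>`, since R 4.1, and it needs no package. This is a minimal sketch that assumes R >= 4.1. Unlike `%>%`, the right-hand side of `|>` must be written as a function call with parentheses:

```r
x <- 9
# base R native pipe (R >= 4.1): the right-hand side needs parentheses
x |> sqrt()
# magrittr also accepts the parentheses, so both styles give the same result
library(magrittr)
x %>% sqrt()
```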

Updating x

x <- 9
# Calculate the square-root of x and update x
x <- sqrt(x)
x
## [1] 3
# Calculate it using pipes
x <- 9
x %<>% sqrt
x
## [1] 3

real-world example

df <- read.csv("beedata.csv")

nrow(subset(df, hive==4))
## [1] 60193
df %>% subset(hive==4) %>% nrow
## [1] 60193

Exercises - Piping

# 1. Rewrite the following code using %>% and %<>%:

x <- 2
round(log(x))
## [1] 1
# 2. Rewrite the second line of the following code using pipes:

x <- rnorm(10,100)
round(sum(sqrt(x)), 3)
## [1] 100.027

Solution 1

x <- 2
round(log(x))
## [1] 1
x <- 2
x %>% log %>% round
## [1] 1
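Exercise 1 also asks for `%<>%`. One possible solution, assuming the goal is to overwrite `x` with the result: the assignment pipe has to be the first pipe in the chain, and any further steps are chained with `%>%`:

```r
library(magrittr)
x <- 2
# pipe x through log() and round(), then assign the result back to x
x %<>% log %>% round
x
## [1] 1
```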

Solution 2

# 2. Rewrite the second line of the following code using pipes:

x <- rnorm(10,100)
round(sum(sqrt(x)), 3)
## [1] 99.993
x %>% sqrt %>% sum %>% round(3)
## [1] 99.993

The penguin data set

Penguins in R

https://allisonhorst.github.io/palmerpenguins/

library(palmerpenguins)
## 
## Attaching package: 'palmerpenguins'
## The following objects are masked from 'package:datasets':
## 
##     penguins, penguins_raw
head(penguins)
## # A tibble: 6 × 8
##   species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
##   <fct>   <fct>              <dbl>         <dbl>             <int>       <int>
## 1 Adelie  Torgersen           39.1          18.7               181        3750
## 2 Adelie  Torgersen           39.5          17.4               186        3800
## 3 Adelie  Torgersen           40.3          18                 195        3250
## 4 Adelie  Torgersen           NA            NA                  NA          NA
## 5 Adelie  Torgersen           36.7          19.3               193        3450
## 6 Adelie  Torgersen           39.3          20.6               190        3650
## # ℹ 2 more variables: sex <fct>, year <int>

Python

https://github.com/mcnakhaee/palmerpenguins?tab=readme-ov-file

from palmerpenguins import load_penguins
penguins = load_penguins()
penguins.head()
##   species     island  bill_length_mm  ...  body_mass_g     sex  year
## 0  Adelie  Torgersen            39.1  ...       3750.0    male  2007
## 1  Adelie  Torgersen            39.5  ...       3800.0  female  2007
## 2  Adelie  Torgersen            40.3  ...       3250.0  female  2007
## 3  Adelie  Torgersen             NaN  ...          NaN     NaN  2007
## 4  Adelie  Torgersen            36.7  ...       3450.0  female  2007
## 
## [5 rows x 8 columns]

Penguins

species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex year
Adelie Torgersen 39.1 18.7 181 3750 male 2007
Adelie Torgersen 39.5 17.4 186 3800 female 2007
Adelie Torgersen 40.3 18.0 195 3250 female 2007
Adelie Torgersen NA NA NA NA NA 2007
Adelie Torgersen 36.7 19.3 193 3450 female 2007
Adelie Torgersen 39.3 20.6 190 3650 male 2007
Adelie Torgersen 38.9 17.8 181 3625 female 2007
Adelie Torgersen 39.2 19.6 195 4675 male 2007
Adelie Torgersen 34.1 18.1 193 3475 NA 2007
Adelie Torgersen 42.0 20.2 190 4250 NA 2007
Adelie Torgersen 37.8 17.1 186 3300 NA 2007
Adelie Torgersen 37.8 17.3 180 3700 NA 2007
Adelie Torgersen 41.1 17.6 182 3200 female 2007
Adelie Torgersen 38.6 21.2 191 3800 male 2007
Adelie Torgersen 34.6 21.1 198 4400 male 2007
Adelie Torgersen 36.6 17.8 185 3700 female 2007
Adelie Torgersen 38.7 19.0 195 3450 female 2007
Adelie Torgersen 42.5 20.7 197 4500 male 2007
Adelie Torgersen 34.4 18.4 184 3325 female 2007
Adelie Torgersen 46.0 21.5 194 4200 male 2007
Adelie Biscoe 37.8 18.3 174 3400 female 2007
Adelie Biscoe 37.7 18.7 180 3600 male 2007
Adelie Biscoe 35.9 19.2 189 3800 female 2007
Adelie Biscoe 38.2 18.1 185 3950 male 2007
Adelie Biscoe 38.8 17.2 180 3800 male 2007
Adelie Biscoe 35.3 18.9 187 3800 female 2007
Adelie Biscoe 40.6 18.6 183 3550 male 2007
Adelie Biscoe 40.5 17.9 187 3200 female 2007
Adelie Biscoe 37.9 18.6 172 3150 female 2007
Adelie Biscoe 40.5 18.9 180 3950 male 2007
Adelie Dream 39.5 16.7 178 3250 female 2007
Adelie Dream 37.2 18.1 178 3900 male 2007
Adelie Dream 39.5 17.8 188 3300 female 2007
Adelie Dream 40.9 18.9 184 3900 male 2007
Adelie Dream 36.4 17.0 195 3325 female 2007
Adelie Dream 39.2 21.1 196 4150 male 2007
Adelie Dream 38.8 20.0 190 3950 male 2007
Adelie Dream 42.2 18.5 180 3550 female 2007
Adelie Dream 37.6 19.3 181 3300 female 2007
Adelie Dream 39.8 19.1 184 4650 male 2007
Adelie Dream 36.5 18.0 182 3150 female 2007
Adelie Dream 40.8 18.4 195 3900 male 2007
Adelie Dream 36.0 18.5 186 3100 female 2007
Adelie Dream 44.1 19.7 196 4400 male 2007
Adelie Dream 37.0 16.9 185 3000 female 2007
Adelie Dream 39.6 18.8 190 4600 male 2007
Adelie Dream 41.1 19.0 182 3425 male 2007
Adelie Dream 37.5 18.9 179 2975 NA 2007
Adelie Dream 36.0 17.9 190 3450 female 2007
Adelie Dream 42.3 21.2 191 4150 male 2007
Adelie Biscoe 39.6 17.7 186 3500 female 2008
Adelie Biscoe 40.1 18.9 188 4300 male 2008
Adelie Biscoe 35.0 17.9 190 3450 female 2008
Adelie Biscoe 42.0 19.5 200 4050 male 2008
Adelie Biscoe 34.5 18.1 187 2900 female 2008
Adelie Biscoe 41.4 18.6 191 3700 male 2008
Adelie Biscoe 39.0 17.5 186 3550 female 2008
Adelie Biscoe 40.6 18.8 193 3800 male 2008
Adelie Biscoe 36.5 16.6 181 2850 female 2008
Adelie Biscoe 37.6 19.1 194 3750 male 2008
Adelie Biscoe 35.7 16.9 185 3150 female 2008
Adelie Biscoe 41.3 21.1 195 4400 male 2008
Adelie Biscoe 37.6 17.0 185 3600 female 2008
Adelie Biscoe 41.1 18.2 192 4050 male 2008
Adelie Biscoe 36.4 17.1 184 2850 female 2008
Adelie Biscoe 41.6 18.0 192 3950 male 2008
Adelie Biscoe 35.5 16.2 195 3350 female 2008
Adelie Biscoe 41.1 19.1 188 4100 male 2008
Adelie Torgersen 35.9 16.6 190 3050 female 2008
Adelie Torgersen 41.8 19.4 198 4450 male 2008
Adelie Torgersen 33.5 19.0 190 3600 female 2008
Adelie Torgersen 39.7 18.4 190 3900 male 2008
Adelie Torgersen 39.6 17.2 196 3550 female 2008
Adelie Torgersen 45.8 18.9 197 4150 male 2008
Adelie Torgersen 35.5 17.5 190 3700 female 2008
Adelie Torgersen 42.8 18.5 195 4250 male 2008
Adelie Torgersen 40.9 16.8 191 3700 female 2008
Adelie Torgersen 37.2 19.4 184 3900 male 2008
Adelie Torgersen 36.2 16.1 187 3550 female 2008
Adelie Torgersen 42.1 19.1 195 4000 male 2008
Adelie Torgersen 34.6 17.2 189 3200 female 2008
Adelie Torgersen 42.9 17.6 196 4700 male 2008
Adelie Torgersen 36.7 18.8 187 3800 female 2008
Adelie Torgersen 35.1 19.4 193 4200 male 2008
Adelie Dream 37.3 17.8 191 3350 female 2008
Adelie Dream 41.3 20.3 194 3550 male 2008
Adelie Dream 36.3 19.5 190 3800 male 2008
Adelie Dream 36.9 18.6 189 3500 female 2008
Adelie Dream 38.3 19.2 189 3950 male 2008
Adelie Dream 38.9 18.8 190 3600 female 2008
Adelie Dream 35.7 18.0 202 3550 female 2008
Adelie Dream 41.1 18.1 205 4300 male 2008
Adelie Dream 34.0 17.1 185 3400 female 2008
Adelie Dream 39.6 18.1 186 4450 male 2008
Adelie Dream 36.2 17.3 187 3300 female 2008
Adelie Dream 40.8 18.9 208 4300 male 2008
Adelie Dream 38.1 18.6 190 3700 female 2008
Adelie Dream 40.3 18.5 196 4350 male 2008
Adelie Dream 33.1 16.1 178 2900 female 2008
Adelie Dream 43.2 18.5 192 4100 male 2008
Adelie Biscoe 35.0 17.9 192 3725 female 2009
Adelie Biscoe 41.0 20.0 203 4725 male 2009
Adelie Biscoe 37.7 16.0 183 3075 female 2009
Adelie Biscoe 37.8 20.0 190 4250 male 2009
Adelie Biscoe 37.9 18.6 193 2925 female 2009
Adelie Biscoe 39.7 18.9 184 3550 male 2009
Adelie Biscoe 38.6 17.2 199 3750 female 2009
Adelie Biscoe 38.2 20.0 190 3900 male 2009
Adelie Biscoe 38.1 17.0 181 3175 female 2009
Adelie Biscoe 43.2 19.0 197 4775 male 2009
Adelie Biscoe 38.1 16.5 198 3825 female 2009
Adelie Biscoe 45.6 20.3 191 4600 male 2009
Adelie Biscoe 39.7 17.7 193 3200 female 2009
Adelie Biscoe 42.2 19.5 197 4275 male 2009
Adelie Biscoe 39.6 20.7 191 3900 female 2009
Adelie Biscoe 42.7 18.3 196 4075 male 2009
Adelie Torgersen 38.6 17.0 188 2900 female 2009
Adelie Torgersen 37.3 20.5 199 3775 male 2009
Adelie Torgersen 35.7 17.0 189 3350 female 2009
Adelie Torgersen 41.1 18.6 189 3325 male 2009
Adelie Torgersen 36.2 17.2 187 3150 female 2009
Adelie Torgersen 37.7 19.8 198 3500 male 2009
Adelie Torgersen 40.2 17.0 176 3450 female 2009
Adelie Torgersen 41.4 18.5 202 3875 male 2009
Adelie Torgersen 35.2 15.9 186 3050 female 2009
Adelie Torgersen 40.6 19.0 199 4000 male 2009
Adelie Torgersen 38.8 17.6 191 3275 female 2009
Adelie Torgersen 41.5 18.3 195 4300 male 2009
Adelie Torgersen 39.0 17.1 191 3050 female 2009
Adelie Torgersen 44.1 18.0 210 4000 male 2009
Adelie Torgersen 38.5 17.9 190 3325 female 2009
Adelie Torgersen 43.1 19.2 197 3500 male 2009
Adelie Dream 36.8 18.5 193 3500 female 2009
Adelie Dream 37.5 18.5 199 4475 male 2009
Adelie Dream 38.1 17.6 187 3425 female 2009
Adelie Dream 41.1 17.5 190 3900 male 2009
Adelie Dream 35.6 17.5 191 3175 female 2009
Adelie Dream 40.2 20.1 200 3975 male 2009
Adelie Dream 37.0 16.5 185 3400 female 2009
Adelie Dream 39.7 17.9 193 4250 male 2009
Adelie Dream 40.2 17.1 193 3400 female 2009
Adelie Dream 40.6 17.2 187 3475 male 2009
Adelie Dream 32.1 15.5 188 3050 female 2009
Adelie Dream 40.7 17.0 190 3725 male 2009
Adelie Dream 37.3 16.8 192 3000 female 2009
Adelie Dream 39.0 18.7 185 3650 male 2009
Adelie Dream 39.2 18.6 190 4250 male 2009
Adelie Dream 36.6 18.4 184 3475 female 2009
Adelie Dream 36.0 17.8 195 3450 female 2009
Adelie Dream 37.8 18.1 193 3750 male 2009
Adelie Dream 36.0 17.1 187 3700 female 2009
Adelie Dream 41.5 18.5 201 4000 male 2009
Gentoo Biscoe 46.1 13.2 211 4500 female 2007
Gentoo Biscoe 50.0 16.3 230 5700 male 2007
Gentoo Biscoe 48.7 14.1 210 4450 female 2007
Gentoo Biscoe 50.0 15.2 218 5700 male 2007
Gentoo Biscoe 47.6 14.5 215 5400 male 2007
Gentoo Biscoe 46.5 13.5 210 4550 female 2007
Gentoo Biscoe 45.4 14.6 211 4800 female 2007
Gentoo Biscoe 46.7 15.3 219 5200 male 2007
Gentoo Biscoe 43.3 13.4 209 4400 female 2007
Gentoo Biscoe 46.8 15.4 215 5150 male 2007
Gentoo Biscoe 40.9 13.7 214 4650 female 2007
Gentoo Biscoe 49.0 16.1 216 5550 male 2007
Gentoo Biscoe 45.5 13.7 214 4650 female 2007
Gentoo Biscoe 48.4 14.6 213 5850 male 2007
Gentoo Biscoe 45.8 14.6 210 4200 female 2007
Gentoo Biscoe 49.3 15.7 217 5850 male 2007
Gentoo Biscoe 42.0 13.5 210 4150 female 2007
Gentoo Biscoe 49.2 15.2 221 6300 male 2007
Gentoo Biscoe 46.2 14.5 209 4800 female 2007
Gentoo Biscoe 48.7 15.1 222 5350 male 2007
Gentoo Biscoe 50.2 14.3 218 5700 male 2007
Gentoo Biscoe 45.1 14.5 215 5000 female 2007
Gentoo Biscoe 46.5 14.5 213 4400 female 2007
Gentoo Biscoe 46.3 15.8 215 5050 male 2007
Gentoo Biscoe 42.9 13.1 215 5000 female 2007
Gentoo Biscoe 46.1 15.1 215 5100 male 2007
Gentoo Biscoe 44.5 14.3 216 4100 NA 2007
Gentoo Biscoe 47.8 15.0 215 5650 male 2007
Gentoo Biscoe 48.2 14.3 210 4600 female 2007
Gentoo Biscoe 50.0 15.3 220 5550 male 2007
Gentoo Biscoe 47.3 15.3 222 5250 male 2007
Gentoo Biscoe 42.8 14.2 209 4700 female 2007
Gentoo Biscoe 45.1 14.5 207 5050 female 2007
Gentoo Biscoe 59.6 17.0 230 6050 male 2007
Gentoo Biscoe 49.1 14.8 220 5150 female 2008
Gentoo Biscoe 48.4 16.3 220 5400 male 2008
Gentoo Biscoe 42.6 13.7 213 4950 female 2008
Gentoo Biscoe 44.4 17.3 219 5250 male 2008
Gentoo Biscoe 44.0 13.6 208 4350 female 2008
Gentoo Biscoe 48.7 15.7 208 5350 male 2008
Gentoo Biscoe 42.7 13.7 208 3950 female 2008
Gentoo Biscoe 49.6 16.0 225 5700 male 2008
Gentoo Biscoe 45.3 13.7 210 4300 female 2008
Gentoo Biscoe 49.6 15.0 216 4750 male 2008
Gentoo Biscoe 50.5 15.9 222 5550 male 2008
Gentoo Biscoe 43.6 13.9 217 4900 female 2008
Gentoo Biscoe 45.5 13.9 210 4200 female 2008
Gentoo Biscoe 50.5 15.9 225 5400 male 2008
Gentoo Biscoe 44.9 13.3 213 5100 female 2008
Gentoo Biscoe 45.2 15.8 215 5300 male 2008
Gentoo Biscoe 46.6 14.2 210 4850 female 2008
Gentoo Biscoe 48.5 14.1 220 5300 male 2008
Gentoo Biscoe 45.1 14.4 210 4400 female 2008
Gentoo Biscoe 50.1 15.0 225 5000 male 2008
Gentoo Biscoe 46.5 14.4 217 4900 female 2008
Gentoo Biscoe 45.0 15.4 220 5050 male 2008
Gentoo Biscoe 43.8 13.9 208 4300 female 2008
Gentoo Biscoe 45.5 15.0 220 5000 male 2008
Gentoo Biscoe 43.2 14.5 208 4450 female 2008
Gentoo Biscoe 50.4 15.3 224 5550 male 2008
Gentoo Biscoe 45.3 13.8 208 4200 female 2008
Gentoo Biscoe 46.2 14.9 221 5300 male 2008
Gentoo Biscoe 45.7 13.9 214 4400 female 2008
Gentoo Biscoe 54.3 15.7 231 5650 male 2008
Gentoo Biscoe 45.8 14.2 219 4700 female 2008
Gentoo Biscoe 49.8 16.8 230 5700 male 2008
Gentoo Biscoe 46.2 14.4 214 4650 NA 2008
Gentoo Biscoe 49.5 16.2 229 5800 male 2008
Gentoo Biscoe 43.5 14.2 220 4700 female 2008
Gentoo Biscoe 50.7 15.0 223 5550 male 2008
Gentoo Biscoe 47.7 15.0 216 4750 female 2008
Gentoo Biscoe 46.4 15.6 221 5000 male 2008
Gentoo Biscoe 48.2 15.6 221 5100 male 2008
Gentoo Biscoe 46.5 14.8 217 5200 female 2008
Gentoo Biscoe 46.4 15.0 216 4700 female 2008
Gentoo Biscoe 48.6 16.0 230 5800 male 2008
Gentoo Biscoe 47.5 14.2 209 4600 female 2008
Gentoo Biscoe 51.1 16.3 220 6000 male 2008
Gentoo Biscoe 45.2 13.8 215 4750 female 2008
Gentoo Biscoe 45.2 16.4 223 5950 male 2008
Gentoo Biscoe 49.1 14.5 212 4625 female 2009
Gentoo Biscoe 52.5 15.6 221 5450 male 2009
Gentoo Biscoe 47.4 14.6 212 4725 female 2009
Gentoo Biscoe 50.0 15.9 224 5350 male 2009
Gentoo Biscoe 44.9 13.8 212 4750 female 2009
Gentoo Biscoe 50.8 17.3 228 5600 male 2009
Gentoo Biscoe 43.4 14.4 218 4600 female 2009
Gentoo Biscoe 51.3 14.2 218 5300 male 2009
Gentoo Biscoe 47.5 14.0 212 4875 female 2009
Gentoo Biscoe 52.1 17.0 230 5550 male 2009
Gentoo Biscoe 47.5 15.0 218 4950 female 2009
Gentoo Biscoe 52.2 17.1 228 5400 male 2009
Gentoo Biscoe 45.5 14.5 212 4750 female 2009
Gentoo Biscoe 49.5 16.1 224 5650 male 2009
Gentoo Biscoe 44.5 14.7 214 4850 female 2009
Gentoo Biscoe 50.8 15.7 226 5200 male 2009
Gentoo Biscoe 49.4 15.8 216 4925 male 2009
Gentoo Biscoe 46.9 14.6 222 4875 female 2009
Gentoo Biscoe 48.4 14.4 203 4625 female 2009
Gentoo Biscoe 51.1 16.5 225 5250 male 2009
Gentoo Biscoe 48.5 15.0 219 4850 female 2009
Gentoo Biscoe 55.9 17.0 228 5600 male 2009
Gentoo Biscoe 47.2 15.5 215 4975 female 2009
Gentoo Biscoe 49.1 15.0 228 5500 male 2009
Gentoo Biscoe 47.3 13.8 216 4725 NA 2009
Gentoo Biscoe 46.8 16.1 215 5500 male 2009
Gentoo Biscoe 41.7 14.7 210 4700 female 2009
Gentoo Biscoe 53.4 15.8 219 5500 male 2009
Gentoo Biscoe 43.3 14.0 208 4575 female 2009
Gentoo Biscoe 48.1 15.1 209 5500 male 2009
Gentoo Biscoe 50.5 15.2 216 5000 female 2009
Gentoo Biscoe 49.8 15.9 229 5950 male 2009
Gentoo Biscoe 43.5 15.2 213 4650 female 2009
Gentoo Biscoe 51.5 16.3 230 5500 male 2009
Gentoo Biscoe 46.2 14.1 217 4375 female 2009
Gentoo Biscoe 55.1 16.0 230 5850 male 2009
Gentoo Biscoe 44.5 15.7 217 4875 NA 2009
Gentoo Biscoe 48.8 16.2 222 6000 male 2009
Gentoo Biscoe 47.2 13.7 214 4925 female 2009
Gentoo Biscoe NA NA NA NA NA 2009
Gentoo Biscoe 46.8 14.3 215 4850 female 2009
Gentoo Biscoe 50.4 15.7 222 5750 male 2009
Gentoo Biscoe 45.2 14.8 212 5200 female 2009
Gentoo Biscoe 49.9 16.1 213 5400 male 2009
Chinstrap Dream 46.5 17.9 192 3500 female 2007
Chinstrap Dream 50.0 19.5 196 3900 male 2007
Chinstrap Dream 51.3 19.2 193 3650 male 2007
Chinstrap Dream 45.4 18.7 188 3525 female 2007
Chinstrap Dream 52.7 19.8 197 3725 male 2007
Chinstrap Dream 45.2 17.8 198 3950 female 2007
Chinstrap Dream 46.1 18.2 178 3250 female 2007
Chinstrap Dream 51.3 18.2 197 3750 male 2007
Chinstrap Dream 46.0 18.9 195 4150 female 2007
Chinstrap Dream 51.3 19.9 198 3700 male 2007
Chinstrap Dream 46.6 17.8 193 3800 female 2007
Chinstrap Dream 51.7 20.3 194 3775 male 2007
Chinstrap Dream 47.0 17.3 185 3700 female 2007
Chinstrap Dream 52.0 18.1 201 4050 male 2007
Chinstrap Dream 45.9 17.1 190 3575 female 2007
Chinstrap Dream 50.5 19.6 201 4050 male 2007
Chinstrap Dream 50.3 20.0 197 3300 male 2007
Chinstrap Dream 58.0 17.8 181 3700 female 2007
Chinstrap Dream 46.4 18.6 190 3450 female 2007
Chinstrap Dream 49.2 18.2 195 4400 male 2007
Chinstrap Dream 42.4 17.3 181 3600 female 2007
Chinstrap Dream 48.5 17.5 191 3400 male 2007
Chinstrap Dream 43.2 16.6 187 2900 female 2007
Chinstrap Dream 50.6 19.4 193 3800 male 2007
Chinstrap Dream 46.7 17.9 195 3300 female 2007
Chinstrap Dream 52.0 19.0 197 4150 male 2007
Chinstrap Dream 50.5 18.4 200 3400 female 2008
Chinstrap Dream 49.5 19.0 200 3800 male 2008
Chinstrap Dream 46.4 17.8 191 3700 female 2008
Chinstrap Dream 52.8 20.0 205 4550 male 2008
Chinstrap Dream 40.9 16.6 187 3200 female 2008
Chinstrap Dream 54.2 20.8 201 4300 male 2008
Chinstrap Dream 42.5 16.7 187 3350 female 2008
Chinstrap Dream 51.0 18.8 203 4100 male 2008
Chinstrap Dream 49.7 18.6 195 3600 male 2008
Chinstrap Dream 47.5 16.8 199 3900 female 2008
Chinstrap Dream 47.6 18.3 195 3850 female 2008
Chinstrap Dream 52.0 20.7 210 4800 male 2008
Chinstrap Dream 46.9 16.6 192 2700 female 2008
Chinstrap Dream 53.5 19.9 205 4500 male 2008
Chinstrap Dream 49.0 19.5 210 3950 male 2008
Chinstrap Dream 46.2 17.5 187 3650 female 2008
Chinstrap Dream 50.9 19.1 196 3550 male 2008
Chinstrap Dream 45.5 17.0 196 3500 female 2008
Chinstrap Dream 50.9 17.9 196 3675 female 2009
Chinstrap Dream 50.8 18.5 201 4450 male 2009
Chinstrap Dream 50.1 17.9 190 3400 female 2009
Chinstrap Dream 49.0 19.6 212 4300 male 2009
Chinstrap Dream 51.5 18.7 187 3250 male 2009
Chinstrap Dream 49.8 17.3 198 3675 female 2009
Chinstrap Dream 48.1 16.4 199 3325 female 2009
Chinstrap Dream 51.4 19.0 201 3950 male 2009
Chinstrap Dream 45.7 17.3 193 3600 female 2009
Chinstrap Dream 50.7 19.7 203 4050 male 2009
Chinstrap Dream 42.5 17.3 187 3350 female 2009
Chinstrap Dream 52.2 18.8 197 3450 male 2009
Chinstrap Dream 45.2 16.6 191 3250 female 2009
Chinstrap Dream 49.3 19.9 203 4050 male 2009
Chinstrap Dream 50.2 18.8 202 3800 male 2009
Chinstrap Dream 45.6 19.4 194 3525 female 2009
Chinstrap Dream 51.9 19.5 206 3950 male 2009
Chinstrap Dream 46.8 16.5 189 3650 female 2009
Chinstrap Dream 45.7 17.0 195 3650 female 2009
Chinstrap Dream 55.8 19.8 207 4000 male 2009
Chinstrap Dream 43.5 18.1 202 3400 female 2009
Chinstrap Dream 49.6 18.2 193 3775 male 2009
Chinstrap Dream 50.8 19.0 210 4100 male 2009
Chinstrap Dream 50.2 18.7 198 3775 female 2009

How are they related?

First Step: Look at your data (Descriptive Statistics)

  • What is your response variable?
  • Plot your data
    • Plot raw data
    • Frequency distribution -> Are there any patterns?
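Here is a minimal sketch of these first steps using the penguin data. It treats `body_mass_g` as the response purely for illustration:

```r
library(palmerpenguins)
# summary statistics for every column (NA counts included)
summary(penguins)
# raw data: response against one candidate explanatory variable
plot(penguins$flipper_length_mm, penguins$body_mass_g,
     xlab = "Flipper length (mm)", ylab = "Body mass (g)")
# frequency distribution of the response
hist(penguins$body_mass_g, main = "Body mass", xlab = "Body mass (g)")
# number of observations per species
table(penguins$species)
```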

Linear Models

Definition - Simple Linear Regression

\[ y_i = \beta_0 + \beta_1 x_i + \epsilon_i, \qquad i = 1, \dots, n \] where \(y\) is the dependent variable, \(x\) is the independent variable (also called explanatory variable), \(\beta_0\) is the intercept parameter, \(\beta_1\) is the slope parameter and \(\epsilon_i \sim N(0,\sigma^2)\) is the error term.

Sample data

set.seed(123)
x <- seq(0,5,0.1) 
y <- x + rnorm(length(x))
plot(x, y)

Fitting a Linear Model

mod <- lm(y~x)
summary(mod)
## 
## Call:
## lm(formula = y ~ x)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.0116 -0.6110 -0.0912  0.6575  2.1444 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.05829    0.25565   0.228    0.821    
## x            0.99216    0.08812  11.259  3.4e-15 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9263 on 49 degrees of freedom
## Multiple R-squared:  0.7212, Adjusted R-squared:  0.7155 
## F-statistic: 126.8 on 1 and 49 DF,  p-value: 3.397e-15
plot(x,y)
abline(reg = mod)
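As a check on what `lm()` computes: with a single predictor, the least-squares estimates have the closed form \(\hat\beta_1 = \mathrm{cov}(x,y)/\mathrm{var}(x)\) and \(\hat\beta_0 = \bar y - \hat\beta_1 \bar x\). This short sketch recomputes both from the sample data above:

```r
set.seed(123)
x <- seq(0, 5, 0.1)
y <- x + rnorm(length(x))
mod <- lm(y ~ x)
# least-squares estimates for a single predictor
b1 <- cov(x, y) / var(x)
b0 <- mean(y) - b1 * mean(x)
c(b0, b1)
# should match the coefficients reported by lm()
coef(mod)
```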

The p-value

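Our null hypothesis (H0) is that there is no relationship between the variables, i.e. that the slope \(\beta_1\) is 0. The p-value is the probability of observing an estimate at least as extreme as ours if H0 were true. It is not the probability that H0 itself is true. A very small p-value (commonly below 0.05) means our data would be very unlikely under H0. We then reject H0 and call the relationship between the variables statistically significant.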
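To make this concrete, the p-value for the slope in the summary above comes from a t-test of \(H_0: \beta_1 = 0\). This minimal sketch recomputes it by hand from the estimate, its standard error and the residual degrees of freedom:

```r
set.seed(123)
x <- seq(0, 5, 0.1)
y <- x + rnorm(length(x))
mod <- lm(y ~ x)
coefs <- summary(mod)$coefficients
# t statistic: estimate divided by its standard error
t_val <- coefs["x", "Estimate"] / coefs["x", "Std. Error"]
# two-sided p-value: probability of a |t| at least this large if H0 is true
p_val <- 2 * pt(-abs(t_val), df = df.residual(mod))
c(t = t_val, p = p_val)
```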

Linear Model - Assumptions

  • Linear relationship between x and y
  • Independence of residuals
  • Constant variance of residuals
  • Normality: Residuals are normally distributed

Overall check for assumptions - 1

plot(mod, which=1)

Overall check for assumptions - 1

  • 1st graph: Constancy of variance
  • Plot of the residuals against the fitted values
  • Should look like the sky at night: points scattered randomly around 0, with no pattern or funnel shape

Overall check for assumptions - 2

plot(mod, which=2)

Overall check for assumptions - 2

  • 2nd graph: Normality of errors

  • Normal QQ plot = plot of the ordered (standardized) residuals against the theoretical quantiles of a normal distribution

  • Should be a reasonably straight line

Overall check for assumptions - 3

plot(mod, which=3)

Overall check for assumptions - 4

plot(mod, which=4)

Assumption 1 - Linear Relationship

  • Use a scatterplot to check
  • If this assumption is violated, predictions and estimates of the standard deviation become unrealistic
  • How to deal with this? You can transform the variables or change the model (e.g. add a polynomial term).

Assumption 1 - Linear Relationship

plot(x,y)

Assumption 2 - Independence of Residuals

  • Plot residuals in time / observation order / based on observation location / …
  • You expect them to be randomly scattered around 0
  • You can also use the Durbin-Watson test
  • Autocorrelation can have serious effects on the model, e.g. unreliable standard errors and p-values
  • You can add an AR correlation term to the model. Sometimes it also helps to add a further covariate.
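One way to add an AR correlation term is generalized least squares from the `nlme` package, which ships with R as a recommended package. This minimal sketch assumes the observations are stored in time order, so that `corAR1(form = ~ 1)` can use the row order as the time index:

```r
library(nlme)
set.seed(123)
x <- seq(0, 5, 0.1)
y <- x + rnorm(length(x))
d <- data.frame(x = x, y = y)
# linear model with AR(1)-correlated errors, ordered by row position
fit_ar <- gls(y ~ x, data = d, correlation = corAR1(form = ~ 1))
summary(fit_ar)
# estimated lag-1 autocorrelation (Phi) on its natural scale
coef(fit_ar$modelStruct$corStruct, unconstrained = FALSE)
```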

Assumption 2 - Independence of Residuals

res <- residuals(mod)
plot(x, res)

Durbin-Watson Test

  • H0: There is no correlation among the residuals.

  • HA: The residuals are autocorrelated.

library(car)
## Loading required package: carData
durbinWatsonTest(mod)
##  lag Autocorrelation D-W Statistic p-value
##    1      0.02431999      1.940949   0.712
##  Alternative hypothesis: rho != 0
  • The p-value is larger than 0.05, so we cannot reject the null hypothesis that there is no correlation between residuals.
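The D-W statistic itself is easy to compute: \(d = \sum_{t=2}^{n}(e_t - e_{t-1})^2 / \sum_{t=1}^{n} e_t^2\). Values near 2 indicate no lag-1 autocorrelation. This sketch recomputes it for the model above. Note that `car::durbinWatsonTest()` obtains its p-value by bootstrapping, so the p-value (but not the statistic) can vary slightly between runs:

```r
library(car)
set.seed(123)
x <- seq(0, 5, 0.1)
y <- x + rnorm(length(x))
mod <- lm(y ~ x)
res <- residuals(mod)
# sum of squared successive differences divided by the residual sum of squares
dw <- sum(diff(res)^2) / sum(res^2)
dw
# same statistic as reported by car
durbinWatsonTest(mod)$dw
```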

Assumption 3 - Constant variance of errors

  • Plot fitted values vs residuals
  • You expect that they are equally scattered around 0

Assumption 3 - Constant variance of errors

plot(fitted(mod),res)

Assumption 4 - Normality of Residuals

  • Use a QQ plot to compare the quantiles of the residuals with those of a normal distribution.
  • Or use the Shapiro-Wilk test.
  • If the residuals are not normally distributed, check for outliers or transform your variable.

Assumption 4 - Normality of Residuals

qqPlot(res)

## [1] 44 18

Shapiro-Wilk-Test for Normality

  • H0: The data is normally distributed
  • HA: The data comes from another distribution
shapiro.test(res)
## 
##  Shapiro-Wilk normality test
## 
## data:  res
## W = 0.99152, p-value = 0.9732
  • We cannot reject the null hypothesis that the data is normally distributed.

If the model doesn’t seem to fit your data…

Is your response really continuous?

If not, another model might be more appropriate:

  • Continuous: linear model → LM e.g. height, weight
  • Count: Poisson model → GLM e.g. number of individuals
  • Binary: Binomial model → GLM e.g. presence / absence
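This is a minimal sketch of the two GLM cases using simulated data. The coefficients 0.5, 0.3, -2 and 1 are arbitrary choices for illustration:

```r
set.seed(1)
x <- seq(0, 5, 0.1)
# count response: Poisson GLM with the (default) log link
counts <- rpois(length(x), lambda = exp(0.5 + 0.3 * x))
fit_pois <- glm(counts ~ x, family = poisson)
summary(fit_pois)
# binary response (presence / absence): binomial GLM with the (default) logit link
presence <- rbinom(length(x), size = 1, prob = plogis(-2 + x))
fit_bin <- glm(presence ~ x, family = binomial)
summary(fit_bin)
```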

Are there obvious patterns?

For example many zeros:

You could split the analysis in two: a binomial model for success / failure (zero vs. non-zero) and a linear model for the successes (the non-zero values)
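Here is one way to set this up as a simple two-part (hurdle-style) analysis. The sketch uses simulated data, and all names and coefficients are hypothetical. First, a binomial GLM models whether the response is non-zero. Then a linear model is fitted to the non-zero values only:

```r
set.seed(42)
n <- 200
x <- runif(n, 0, 5)
# many zeros: an observation is only non-zero with probability plogis(-1 + 0.8 * x)
nonzero <- rbinom(n, size = 1, prob = plogis(-1 + 0.8 * x))
amount <- ifelse(nonzero == 1, 2 + 1.5 * x + rnorm(n, sd = 0.5), 0)
# part 1: success / failure (zero vs. non-zero)
part1 <- glm(as.integer(amount > 0) ~ x, family = binomial)
# part 2: size of the response, using the successes only
part2 <- lm(amount ~ x, subset = amount > 0)
summary(part1)
summary(part2)
```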

Are there outliers that should be omitted?

4th graph of plot(model): Cook's distance for each observation (by observation number) → highlights particularly influential data points
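This minimal sketch inspects influence numerically, reusing the sample data from above. The cut-off \(4/n\) is only one common rule of thumb, not a fixed standard. Flagged points should be checked, not dropped automatically:

```r
set.seed(123)
x <- seq(0, 5, 0.1)
y <- x + rnorm(length(x))
mod <- lm(y ~ x)
# Cook's distance for every observation
cd <- cooks.distance(mod)
# observations above the 4/n rule of thumb
which(cd > 4 / length(cd))
# the same information as a plot (4th diagnostic graph)
plot(mod, which = 4)
```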

Relationship not a straight line?

Polynomial regression, e.g. adding a quadratic term, might be an option

 lm(y ~ x + I(x^2))
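This sketch uses simulated curved data, with coefficients chosen arbitrarily. It compares the straight-line fit with the quadratic one using an F-test on the two nested models:

```r
set.seed(1)
x <- seq(0, 5, 0.1)
y <- 1 + 0.5 * x - 0.4 * x^2 + rnorm(length(x), sd = 0.5)
fit_lin <- lm(y ~ x)
# I() makes ^ act as arithmetic squaring inside the formula
fit_quad <- lm(y ~ x + I(x^2))
# F-test: does the quadratic term improve the fit significantly?
anova(fit_lin, fit_quad)
plot(x, y)
lines(x, fitted(fit_quad), col = "red")
```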

Transformation

You could, for example, log-transform the response variable (this requires all values of y to be positive)

    lm(log(y) ~ x)
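This sketch uses simulated, strictly positive data with arbitrary coefficients. After fitting on the log scale, `exp()` of the slope gives the factor by which y is multiplied when x increases by 1:

```r
set.seed(1)
x <- seq(0, 5, 0.1)
# multiplicative (log-normal) errors, so y is always positive
y <- exp(0.2 + 0.4 * x + rnorm(length(x), sd = 0.2))
fit_log <- lm(log(y) ~ x)
coef(fit_log)
# back-transformed slope: multiplicative change in y per unit increase in x
exp(coef(fit_log)["x"])
```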

Are there other explanatory variables that you should include?

Definition - Linear Model with multiple predictors

\[ y_i = \beta_0 + \beta_1 x_{1,i} + \dots + \beta_p x_{p,i} + \epsilon_i, \qquad i = 1, \dots, n \] where \(y\) is the dependent variable, \(x_1, \dots, x_p\) are the independent variables (also called explanatory variables), \(\beta_0, \dots, \beta_p\) are the regression coefficients, \(\epsilon_i \sim N(0,\sigma^2)\) is the error term and \(p \geq 1\).
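For example, with the penguin data one could model body mass using two predictors. In this minimal sketch the choice of predictors is only for illustration. By default, `lm()` drops rows with missing values:

```r
library(palmerpenguins)
# body mass explained by flipper length and bill length
fit_multi <- lm(body_mass_g ~ flipper_length_mm + bill_length_mm, data = penguins)
summary(fit_multi)
# one intercept plus one coefficient per predictor
coef(fit_multi)
```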

Plotting

Plotting

  • many, many libraries
  • we are going to look at base, ggplot and plotly

A basic plot (Standard library)

df.4 <- df[df$hive==4,]
# plot temperature within the hive (column t_i_3)
plot(df.4$time, df.4$t_i_3)

Improving (Standard library)

# plot temperature within the hive (column t_i_3)
plot(df.4$time, df.4$t_i_3, 
     ylim=c(0,40)) # added this

Improving (Standard library)

# plot temperature within the hive (column t_i_3)
plot(df.4$time, df.4$t_i_3, 
     ylim = c(0,40), 
     xlab = "Time (2019)",  # added this
     ylab = "Temperature within hive", # added this
     main = "Sensor measurements") # added this

Improving (Standard library)

# plot temperature within the hive (column t_i_3)
plot(df.4$time, df.4$t_i_3, 
     ylim = c(0,40), 
     type = "b", # added this
     lty = 1, # added this
     xlab = "Time (2019)", 
     ylab = "Temperature within hive", 
     main = "Sensor measurements")

Points or Lines? (type)

  • “p”: Points
  • “l”: Lines
  • “b”: Both

Line type (lty)
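The line types themselves were shown as an image. This minimal sketch draws the seven standard line types 0 to 6, labelled with their names as listed in `?par`:

```r
lty_names <- c("blank", "solid", "dashed", "dotted", "dotdash", "longdash", "twodash")
# empty plot area to draw into
plot(NULL, xlim = c(0, 1), ylim = c(-0.5, 6.5), axes = FALSE,
     xlab = "", ylab = "", main = "Line types (lty)")
for (i in 0:6) {
  # one horizontal segment per line type
  segments(0.3, i, 1, i, lty = i, lwd = 2)
  # label to the left of the segment, e.g. "2: dashed"
  text(0.28, i, paste0(i, ": ", lty_names[i + 1]), adj = 1)
}
```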

Improving (Standard library)

df.4 <- df.4[df.4$t_i_3 > 5 & df.4$t_i_3 < 40,] # added this: keep readings between 5 and 40
# plot temperature within the hive (column t_i_3)
plot(df.4$time, df.4$t_i_3, 
     ylim = c(0,40), 
     type = "b",
     lty = 1,
     xlab = "Time (2019)", 
     ylab = "Temperature within hive", 
     main = "Sensor measurements")

Improving (Standard library)

# plot temperature within the hive (column t_i_3)
plot(df.4$time, df.4$t_i_3, 
     ylim = c(0,40), 
     type = "b",
     lty = 1,
     pch = 4, # added this
     xlab = "Time (2019)", 
     ylab = "Temperature within hive", 
     main = "Sensor measurements")

Improving (Standard library)

Point types (pch)
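The point types were also shown as an image. This minimal sketch draws the symbols `pch = 0` to `25` with their numbers. Symbols 21 to 25 can additionally take a fill colour via `bg`:

```r
pch_values <- 0:25
# arrange the 26 symbols in a grid with 6 columns
px <- pch_values %% 6
py <- pch_values %/% 6
plot(px, -py, pch = pch_values, cex = 2, bg = "lightblue",
     axes = FALSE, xlab = "", ylab = "", main = "Point types (pch)",
     xlim = c(-0.5, 5.5), ylim = c(-4.5, 0.5))
# write each pch number to the right of its symbol
text(px, -py, labels = pch_values, pos = 4, offset = 1)
```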

Improving (Standard library)

# plot temperature within the hive (column t_i_3)
plot(df.4$time, df.4$t_i_3, 
     ylim = c(0,40), 
     type = "b",
     lty = 1,
     pch = 4,
     xlim = as.POSIXct(c("2019-08-08", "2019-08-09")), # added this
     xlab = "Time (2019-08-08)", 
     ylab = "Temperature within hive", 
     main = "Sensor measurements")

Improving (Standard library)

# plot temperature within the hive (column t_i_3)
plot(df.4$time, df.4$t_i_3, 
     ylim = c(0,40), 
     type = "b",
     lty = 1,
     pch = 4,
     xlim = as.POSIXct(c("2019-08-08", "2019-08-09")),
     xlab = "Time (2019-08-08)", 
     ylab = "Temperature within hive", 
     main = "Sensor measurements",
     xaxt = "n") # added this: suppress the default x axis
axis.POSIXct(1, 
             at = seq(min(df.4$time), max(df.4$time), by = "1 hour"), 
             format = "%H:00") # added this: hourly tick labels

Improving (Standard library)

Even more complex (Standard library)

# subset data
df.4 <- df[df$hive==4,]
# plot temperature outside
plot(df.4$time, df.4$t_o, ylim=c(0,40),type = 'p', pch=4)
# choose colours
cl <- rainbow(5)
# choose columns
cols <- 4:8
# add each column as points to the existing plot
# (ylim is set once in plot() above and is not needed here)
for (i in 1:5){
    points(df.4$time, 
           df.4[, cols[i]],
           col = cl[i],
           pch = 4)
}
# add legend
legend("topright", legend=c(1, 2, 3, 4, 5, "outside"),
       col=c(cl, "black"), pch = 4, lty = 0, cex=0.8)

Even more complex (Standard library)

basic plot (ggplot)

# plot data
library(ggplot2)
ggplot(data = df.4, aes(x=time, y=t_i_3)) + geom_point()

improving (ggplot)

# plot data
library(ggplot2)
ggplot(data = df.4, aes(x=time, y=t_i_3)) + geom_point(shape=4) + 
  ylim(c(0, 40)) + 
  xlab("Time (2019)") + 
  ylab("Temperature within hive") + 
  ggtitle("Sensor measurements")

improving (ggplot)

## Warning: Removed 152 rows containing missing values or values outside the scale range
## (`geom_point()`).

more complex (ggplot)

# subset data
df.4 <- df[df$hive==4,]
# choose columns
df.4.cols <- df.4[,c(1,4:9)]
# reshape data
library(reshape)
mdf <- melt(df.4.cols, id=c("time")) 
# plot data
library(ggplot2)
ggplot(data = mdf, aes(x=time, y=value)) + 
  geom_line(aes(colour=variable)) + 
  ylim(c(0, 40))
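The same long format can also be produced with `tidyr::pivot_longer()`. Since `beedata.csv` is not included here, this sketch uses a small hypothetical data frame in the same wide layout: one `time` column plus one column per sensor. The column names and values are illustrative only:

```r
library(tidyr)
library(ggplot2)
# hypothetical toy data: hourly readings for two in-hive sensors and one outside sensor
wide <- data.frame(
  time  = as.POSIXct("2019-08-08 00:00", tz = "UTC") + 3600 * (0:23),
  t_i_1 = 30 + sin(0:23 / 4),
  t_i_2 = 31 + cos(0:23 / 4),
  t_o   = 15 + 5 * sin(0:23 / 4)
)
# wide -> long: one row per (time, sensor) pair
long <- pivot_longer(wide, cols = -time, names_to = "variable", values_to = "value")
ggplot(data = long, aes(x = time, y = value)) +
  geom_line(aes(colour = variable)) +
  ylim(c(0, 40))
```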

more complex (ggplot)

Basic plot (plotly)

library(plotly)
fig <- plot_ly(df.4[1:100,], x = ~time, y = ~t_i_3)
fig

Basic plot (plotly)

How to save plotly plots

p <- plot_ly(df.4[1:100,], x = ~time, y = ~t_i_3)
# saving the plot as html
htmlwidgets::saveWidget(p, "testploty.html")
## No trace type specified:
##   Based on info supplied, a 'scatter' trace seems appropriate.
##   Read more about this trace type -> https://plotly.com/r/reference/#scatter
## No scatter mode specifed:
##   Setting the mode to markers
##   Read more about this attribute -> https://plotly.com/r/reference/#scatter-mode
# If you want a static image (e.g. png), open the html file and use the small
# camera button to export the plot as png. You could also use orca to export a
# static image directly from your code, but orca seems to be a bit complicated to install.
# I haven't used PowerPoint for a long time, but I think there should be a way to embed an html file: https://www.techwalla.com/articles/how-to-embed-an-excel-workbook-icon-into-powerpoint

Saving your plot

png("test.png")
hist(rnorm(100))
dev.off()
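`png()` also accepts the image size and resolution as arguments, and ggplot objects can be saved with `ggsave()`. This minimal sketch writes both files to R's temporary directory. The file names are arbitrary:

```r
library(ggplot2)
# base graphics: open a png device, draw, then close the device to write the file
base_file <- file.path(tempdir(), "test.png")
png(base_file, width = 800, height = 600, res = 100)
hist(rnorm(100), main = "Histogram of random normal values")
dev.off()
# ggplot: ggsave() writes a plot object directly; width/height in inches by default
gg_file <- file.path(tempdir(), "test_gg.png")
p <- ggplot(data.frame(v = rnorm(100)), aes(x = v)) + geom_histogram(bins = 20)
ggsave(gg_file, plot = p, width = 6, height = 4, dpi = 150)
```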

Assignment

Assignment

  • you all get a confirmation of participation
  • you can submit a small assignment for 1 CP

Assignment

  • you can choose if you want to use R or Python
  • Create a markdown document / Jupyter Notebook
  • Choose a dataset from PANGAEA
  • Think about the following question: What do you want to find out? What is the motivation for your analysis?
  • Choose 5 of the following tasks. You can also come up with your own ideas.
  • Write a short text for each task explaining what you did

Ideas

  • Make some exploratory analysis:
    • print the mean/ median/ standard deviation for each column
    • print the total number of missing values
    • print the number of missing values for each column
  • Transform some of the columns/ Create new columns based on existing ones
  • Create a subset of your data
  • Exclude all rows with missing values
  • Create a plot
  • Write a function that takes a filename and some additional parameters. The function should create a plot and save it as “png” with that filename
  • Create a linear model

Assignment

  • If you want to do an assignment, please send me an e-mail:

  • “I want to do an assignment.”

  • “I do not want a grade on my certificate. / I want to have a grade on my certificate./ I want to have a grade if it is better than <1.3, 1.7, 2.0, 2.3, 2.7, 3.0, …>”

Deadline

  • Suggestion for deadline: ?

Enjoy R and Python ;)